ARTICLE SUMMARY



Article focus 

■ Diffuse hyperalgesia may be evaluated by tender 
point (TP) examination and may reflect deficient 
descending pain inhibition as in fibromyalgia. 

■ TP examination is increasingly relevant to improve 
clinical assessment in inflammatory as well as 
non-inflammatory rheumatological disorders. 

■ Reproducibility of this examination technique is not 
well documented and was therefore investigated. 

Key messages 

■ In sick-listed chronic low back pain (LBP) 
patients, digital TP examination was a reliable 
but not precise instrument. 

■ In both women and men, there was more than 
70% agreement within ±3 TPs. 

■ The method was quick and easy to use with no 
requirements of equipment, except in initial train- 
ing sessions. 

Strengths and limitations of this study 

■ The study included a well-defined chronic LBP 
population that was referred from general practi- 
tioners for LBP examination and return-to-work 
intervention. 

■ The number of patients was limited and only two 
raters were involved, resulting in wide CIs and 
limited generalisability. 



ABSTRACT 

Objectives: To evaluate the reliability and agreement 
of digital tender point (TP) examination in chronic low 
back pain (LBP) patients. 
Design: Cross-sectional study. 
Settings: Hospital-based validation study. 
Participants: Among sick-listed LBP patients referred 
from general practitioners for low back examination 
and return-to-work intervention, 43 and 39 patients, 
respectively (18 women, 46%) entered and completed 
the study. 

Main outcome measures: The reliability was 
estimated by the intraclass correlation coefficient (ICC), 
and agreement was calculated for up to ±3 TPs. 
Furthermore, the smallest detectable difference was 
calculated. 

Results: TP examination was performed twice by two 
consultants in rheumatology and rehabilitation at 
20 min intervals and repeated 1 week later. Intrarater 
reliability in the more and less experienced rater was 
ICC 0.84 (95% CI 0.69 to 0.98) and 0.72 (95% CI 0.49 
to 0.95), respectively. The figures for inter-rater 
reliability were intermediate between these figures. 
In more than 70% of the cases, the raters agreed 
within ±3 TPs in both men and women and between 
test days. The smallest detectable difference between 
raters was 5, and for the more and less experienced 
rater it was 4 and 6 TPs, respectively. 
Conclusions: The reliability of digital TP examination 
ranged from acceptable to excellent, and agreement 
was good in both men and women. The smallest 
detectable differences varied from 4 to 6 TPs. Thus, TP 
examination in our hands was a reliable but not precise 
instrument. Digital TP examination may be useful in 
daily clinical practice, but regular use and training 
sessions are required to secure quality of testing. 



INTRODUCTION 

Tender point (TP) examination has been the cornerstone examination in
patients with chronic widespread pain (CWP) to distinguish fibromyalgia
patients from patients with CWP only. In the general population, the
former and latter conditions have been identified in 0.5-4% 1 and
10-13%, 2 3 respectively. Persons fulfilling the fibromyalgia criteria
(CWP and ≥11 TPs) report more pain and disability than persons with CWP
who have fewer than 11 TPs. 4 TP examination is performed by
standardised digital palpation at 18 points symmetrically distributed on
the body (figure 1). 5 In the general population, men and women had a
median of 3 and 6 TPs, respectively, 6 and women may have up to 4 TPs
more than men. 7

TP examination may be relevant in conditions other 
than CWP or regional pain syndromes. In inflammatory 
rheumatic diseases, TP examination may also contribute 
to the clinical evaluation. For instance, high-disease 
activity in the absence of inflammatory activity in 
rheumatoid arthritis is often seen in patients with many 
TPs. 8 This may lead to inappropriate treatment of 
disease activity. In systemic lupus erythematosus, health 
status has been shown to be inferior in patients with 
many TPs as compared with patients with few TPs. 9 

In sick-listed low back pain (LBP) patients, the inten- 
sity of back pain is associated with the number of TPs, 
and patients with radiculopathy have fewer TPs than 
patients with non-specific LBP. 10 Furthermore, TPs are 
associated with the reporting of widespread pain and 
with long-term prognosis. 11 According to another 
study, 12 patients with both CWP and non-specific LBP 
have more pain, higher disability and more TPs than 
patients with LBP only. 

Reliability and agreement studies are, however, few and insufficient.
The original study defining fibromyalgia 5 included 293 patients and
265 controls. Since then, we have been able to identify only three small
studies comparing the reliability of digital palpation and dolorimetry
with TPs defined as in the original study. 13-15 Each study included
15-25 individuals. The reliability was acceptable and comparable for
both dolorimetry and digital palpation, and κ values of 0.44-0.92 were
reported for the digital examination. However, only the reliability of
testing each TP location as positive was estimated, not the reliability
of the total TP counts. In other non-specific pain studies, the
reliability of TP examination was not formally tested, or digital
examination was not used. 16-20

Since the total TP count — and not each single TP — is 
used for the clinical evaluation in rheumatological con- 
ditions, more reliability and agreement studies of the 
total TP count are needed. 

Accordingly, the purpose of the present study was to 
investigate the reproducibility of total TP counts based 
on digital TP examination in chronic sick-listed LBP 
patients in terms of (1) intrarater and inter-rater reliabil- 
ity and (2) intrarater and inter-rater agreement. 

METHODS 

The patients were recruited among patients referred 
from their general practitioners to the Spine Center for 
participation in a controlled study. 

Inclusion criteria: partly or fully sick-listed for more 
than 4 weeks due to LBP with or without radiculopathy, 
LBP should be the prime reason for sick-listing and at 
least as bothersome as pain elsewhere, age 16-60 years, 
referred from a well-defined geographical area of about 
280 000 inhabitants, and the patient should be able to 
speak and understand Danish. 

Exclusion criteria: living outside the referral area, continuing or
progressive radiculopathy resulting in plans for surgery, low back
surgery within the last year, previous lumbar fusion operation,
suspected cauda equina syndrome, progressive paresis or other serious
back disease (eg, tumour), pregnancy, known dependency on drugs or
alcohol or primary psychiatric disease.

Figure 1 Locations of tender points according to the American College
of Rheumatology. 5

The patients were contacted between 1 November 2009 
and 1 March 2010 and were only included in the present 
study after more than 3 weeks had passed since their first 
consultation at the Spine Center. They were offered par- 
ticipation in the study by one of the authors (JC), who 
was the leader of the project but was not a staff member, 
and they were told that the investigation had nothing to 
do with the management of their LBP. The patients were 
informed that the examination would only include meas- 
uring of diffuse tenderness by TP examination and spinal 
range of motion (not reported in this paper). Previously, 
all patients had been subjected to a clinical low back 
examination and TP examination at their first consult- 
ation at the Spine Center. 

The examinations were performed by two clinicians 
(OKJ and MGN), both consultants in rheumatology and 
rehabilitation. Beforehand, the TP examination method 
was taught by the more experienced rater (OKJ=Rater 
A) to the less experienced rater (MGN=Rater B) during 
a 2 h session. Each test day, before starting examinations, 
the two raters calibrated their thumbs with a dolorim- 
eter, 21 which was able to register four pressures at a time 
and calculate means and SDs. 

The examinations were performed during two test 
days, days 1 and 2, at 1-week intervals. To include all 
patients, the test days were repeated twice. The patients 
were randomised so that half of the patients were first 
tested by Rater A, the other half first by Rater B, but 
keeping the same sequence on day 2 as on day 1. Twenty 
minutes passed between the examinations. 
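
The allocation of rater order described above could be sketched as follows. This is only an illustration under stated assumptions: the paper does not describe how the randomisation was generated, and the function name `assign_first_rater`, the patient identifiers and the seed are hypothetical.

```python
import random

def assign_first_rater(patient_ids, seed=None):
    """Randomly assign which rater examines each patient first.

    About half of the patients start with Rater A and the rest with
    Rater B. The same mapping would be reused on day 2, so the sequence
    is kept constant across test days.
    """
    rng = random.Random(seed)      # seeded generator for a reproducible list
    ids = list(patient_ids)
    rng.shuffle(ids)               # random order of patients
    half = len(ids) // 2
    return {pid: ("A" if i < half else "B") for i, pid in enumerate(ids)}

# Hypothetical identifiers 1..39, matching the number of completers
first_rater = assign_first_rater(range(1, 40), seed=2010)
print(sum(1 for r in first_rater.values() if r == "A"), "patients start with Rater A")
```

With an odd number of patients, the two groups necessarily differ in size by one.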

Before examination, the patients filled out a question- 
naire including questions regarding back+leg pain 22 and 
disability, 23 increasing scores representing increasing 
pain and disability. At the clinical examination, the 
patient's range of spinal motion was first measured in the 
standing position. Subsequently, the patient was asked to 
lie prone, and a 4 kg digital pressure was demonstrated 
on the distal, dorsal aspect of the forearm. The patient 
was instructed in the following way: "This is a firm pres- 
sure. Afterwards, this pressure will be applied on different 
spots on the body. At every spot, I would like you to 
report if the pressure is painful or is felt like firm pres- 
sure." The TPs (figure 1) were tested in a standardised 
manner from right to left, first testing the medial fat pads 
of the knees and the posterior aspects of the greater tro- 
chanter. Afterwards, with the patient seated, the spots 
were tested from the top and downwards as follows: the 
suboccipital muscle insertions, the anterior-lateral aspect 
of the intertransverse aspects of C5-7, the midpoints of 
the upper borders of the trapezius, the medial parts of 
the supraspinatus, the costochondral junctions of costa 2, 
the forearm 2 cm distal to the epicondyles and the outer 
upper quadrants of the buttocks. The patients were instructed not to
disclose the result of the TP examination to the raters or others.

Positive TPs (ie, pressures causing pain) were memorised by the raters
and summed up to the total number of TPs (the TP count). The procedure
lasted 6-8 min
per examination. A secretary was associated with each 
rater. The TP counts were reported to this secretary, who 
passed the data to the project leader (JC). In this way, 
the raters were blinded in relation to each other. 

The secretary also registered pain response at every 
single TP location. 

Statistical analyses 

A sample size of at least 40 persons was planned for testing intrarater
and inter-rater reliability. 24 The TP counts were discrete numerical
variables and were normally distributed.
For the quantification of intrarater and inter-rater repro- 
ducibility of TP examination, two types of analysis were 
applied: the intraclass correlation coefficient (ICC) and 
the Bland-Altman method for assessing agreement. 25 26 
The ICC indicates how well the measurement distinguishes variation
between subjects from measurement variation. It was defined as the ratio
of the variance among patients (subject variability) to the total
variance (subject variability, observer variability and measurement
variability). The ICC ranges between 0 (no reliability) and 1 (perfect
reliability); values are considered excellent when >0.75 and poor when
<0.40, and values in between represent moderate-to-good reliability. 27
According to another reference, ICC >0.7 is considered good. 25
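
The variance-ratio definition can be illustrated with a short computation. The paper does not state which ICC form was used, so the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form (often written ICC(2,1)). The function name `icc_2_1` and the example TP counts are hypothetical and are not the study data.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column
    per rater (or per test occasion)."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters or occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                     # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical TP counts for six patients on day 1 and day 2
example = [[3, 4], [8, 7], [12, 13], [5, 5], [16, 14], [0, 1]]
print(round(icc_2_1(example), 2))
```

Because the between-rater mean square enters the denominator, a systematic difference between raters or test days lowers this form of the ICC, which matches the inclusion of observer variability in the total variance.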

The Bland-Altman method provides insight into the dis- 
tribution of differences in relation to mean values. 28 
Agreement was quantified by calculating the mean differ- 
ence between two sets of observations and the SD for this 
difference. The closer the mean difference was to 0 and 
the smaller the SD of this difference, the better was the 
agreement. The differences were depicted in relation to 
the mean values. The 95% limits of agreement were 
defined as the mean difference between the raters ±1.96 × SD of the
difference. Furthermore, agreement within ±1 TP and ±3 TPs was
calculated.
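
A minimal sketch of the agreement calculations described above is given below. The function names `bland_altman` and `percent_within` and the example counts are hypothetical; the sketch assumes the sample SD of the paired differences is used.

```python
from statistics import mean, stdev

def bland_altman(first, second, z=1.96):
    """Return the mean difference, the SD of the differences and the
    95% limits of agreement for two paired sets of TP counts."""
    diffs = [b - a for a, b in zip(first, second)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)    # sample SD of the paired differences
    return d_mean, d_sd, (d_mean - z * d_sd, d_mean + z * d_sd)

def percent_within(first, second, tolerance):
    """Percentage of pairs whose TP counts differ by at most `tolerance`."""
    hits = sum(1 for a, b in zip(first, second) if abs(b - a) <= tolerance)
    return 100.0 * hits / len(first)

# Hypothetical TP counts from two examinations of the same patients
exam_1 = [5, 7, 9, 3, 12, 0]
exam_2 = [6, 7, 7, 4, 15, 2]
print(bland_altman(exam_1, exam_2))
print(percent_within(exam_1, exam_2, 1), percent_within(exam_1, exam_2, 3))
```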

To determine whether a real change in outcome has 
occurred in clinical practice and research, a change must 
be at least the smallest detectable difference (SDD) of a 
measurement procedure. 25 The SDD was calculated as
1.96 × √(2 × SEM²), where the SE of measurement (SEM) was defined as
SD of the difference/√2. The SDD was rounded up to the nearest whole
number.
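
Since SEM is defined as SD of the difference/√2, the expression 1.96 × √(2 × SEM²) reduces to 1.96 × SD of the difference. The sketch below applies the formula to the SDs of the differences reported in the Results (1.90, 2.68, 2.30 and 2.08). The function names are hypothetical.

```python
import math

def sem_from_sd_diff(sd_diff):
    """SE of measurement, defined as SD of the difference / sqrt(2)."""
    return sd_diff / math.sqrt(2)

def sdd(sd_diff, z=1.96):
    """Smallest detectable difference, rounded up to a whole number of TPs."""
    sem = sem_from_sd_diff(sd_diff)
    return math.ceil(z * math.sqrt(2 * sem ** 2))

# SDs of the differences as reported in the Results section
for label, sd in [("Rater A", 1.90), ("Rater B", 2.68),
                  ("Inter-rater, day 1", 2.30), ("Inter-rater, day 2", 2.08)]:
    print(label, sdd(sd))
```

With these inputs the function returns 4, 6, 5 and 5 TPs, in line with the SDD values given in the Results.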

Cronbach's α is a measure of internal consistency indicating whether
different items of a test battery are intercorrelated and measure the
same construct. Values >0.9 are considered excellent.
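
For illustration, Cronbach's α can be computed from item variances as sketched below. The paper does not specify which measurements were treated as items, so the sketch assumes each repeated TP count (for example day 1 and day 2 for one rater) is an item. The function name `cronbach_alpha` and the data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of k items, each a list holding one
    score per subject (same subject order in every item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]       # per-subject sum
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical TP counts of one rater on day 1 and day 2
day_1 = [3, 8, 12, 5, 16, 0]
day_2 = [4, 7, 13, 5, 14, 1]
print(round(cronbach_alpha([day_1, day_2]), 2))
```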

The reliability of each TP location was measured by κ statistics.
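
As an illustration of the κ statistic for a single TP location, the sketch below assumes Cohen's κ for two sets of binary ratings (1 = painful, 0 = not painful). The paper does not state the κ variant used. The function name `cohen_kappa` and the ratings are hypothetical.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two lists of binary ratings (1 = positive TP)."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    p_a = sum(rater_a) / n    # proportion rated positive by the first rater
    p_b = sum(rater_b) / n    # proportion rated positive by the second rater
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)    # agreement expected by chance
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of one TP location in eight patients
rater_a = [1, 1, 0, 0, 1, 0, 1, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1]
print(cohen_kappa(rater_a, rater_b))
```

κ corrects the observed percentage agreement for agreement expected by chance, so a location can show high percentage agreement and still have a low κ when nearly all patients are rated the same way.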

Figure 2 Flow chart. Called and invited to join the study: 83 patients.
Declined participation: 40 patients. Accepted and completing Day 1:
43 patients. Excluded: 1 patient (hospitalized, pain medicine changed).
Dropped out: 3 patients. Completing Day 1 and Day 2: 39 patients.

RESULTS 

Eighty-three patients were invited to join the study, and 
39 patients completed both test days (figure 2). Four 
patients dropped out from days 1 to 2, three without 
explanation, and the fourth was excluded because of 
hospital admission and change of pain medication 
between the two test days. Pain medication was 
unchanged in the other patients. 

Baseline characteristics are displayed in table 1. 



Intrarater reliability and agreement 

The mean TP count was seven and differed little 
between test days (table 2). The ICC in Rater A was 
excellent, 0.83 (95% CI 0.69 to 0.98), reflecting a high 
degree of reliability. The ICC was somewhat lower, but still good, in
Rater B: 0.72 (95% CI 0.49 to 0.95). The relations
between TP counts on days 1 and 2 are graphically dis- 
played in figure 3 (left panel). The circles representing 
more than one observation were all located near the 
equality lines, and the observations were distributed over 
the whole range of TP counts. 



Table 1 Baseline characteristics

Variables                                   Value
Sex (men/women)                             21/18
Age (mean, range)                           42.0 (24-58)
Back+leg pain (0-60, median, range)         22 (2-50)
Disability (0-23, median, range)            14 (0-23)
Tender points* (0-18, median, range)        8 (0-18)
Duration of pain (n, %)
  3-6 months                                13 (33)
  7-12 months                               12 (31)
  >12 months                                14 (36)

Back+leg pain measured as the sum of worst, average and actual pain.
Disability estimated by the Roland Morris Questionnaire, and tender
points estimated by standardised digital palpation.
*Median tender points of Observer A on day 1: men 5, women

In about half of the observations, agreement was 
within ±1 TP. For both raters, more than 75% of the TP 
counts were within ±3 TPs in both sexes. The limits of 
agreement were within ±4 and ±6 TPs for Rater A and 
Rater B, respectively (figure 3, right panel), corresponding to the SDD
(table 2). Measurement errors (SEM) were 1.34 (1.90/√2) and 1.89
(2.68/√2) for Rater A and Rater B, respectively. Cronbach's α was 0.96
and 0.92 for Rater A and B, respectively.

Inter-rater reliability and agreement 

The mean TP counts differed little
between the two raters (table 3). The relations between
TP counts of Raters A and B are shown in figure 4, left 
panel, and the limits of agreement in the right panel. 
The circles representing more than one observation 
were all located near the equality and zero lines. On 
both test days, ICC was higher than 0.75. In more than 70% of the
cases, Rater B agreed with Rater A within ±3 TPs in both men and women.
The limits of agreement were within ±5 TPs, corresponding to an SDD of
5 TPs. Measurement errors (SEM) were 1.63 (2.30/√2) and 1.47 (2.08/√2)
on days 1 and 2, respectively. Cronbach's α was 0.94 and 0.96 on days 1
and 2, respectively.

Reliability of testing each TP location 

The reliability of testing each TP location is shown in the appendix.
Agreement varied from 69% to 90%, and κ values varied from 0.13 to 0.89.



DISCUSSION 

The present study showed that digital TP examination 
resulted in total TP counts with acceptable-to-excellent 
reliability when calibration of the thumbs with a dolorim- 
eter was performed before the testing. This indicated 
that the measurement error, which was less than 2 TPs, 
was considerably smaller than the variation between individuals. The
less experienced Rater B did not perform as well as the more
experienced Rater A, and this was especially evident on comparison of
the lower limits of the CIs. Nevertheless, the reliability of Rater B
was acceptable, and more training and regular use would probably
improve the results. Training has been shown to reduce
the variability in applying a 4 kg digital force. 29 

Agreement is independent of the variation between 
subjects. We consider an agreement of more than 70% 
as good, and it was found for ±3 TPs in both men and 
women, indicating that digital TP examination in daily 
practice may be used, keeping in mind the uncertainty 
of ±3 TPs. This part of the result was especially import- 
ant, since we found that TP counts were higher in 
women than in men, in line with other studies. In the 
general population, TP counts of more than 10 and 6 
have been identified in 10-20% of women and men, 
respectively. 6 7 Thus, a TP count of 9 may be normal in 
women, but high in men. 
Table 2 Intrarater differences, reliability and agreement

Measure                                    Observer A             Observer B
Day 1, mean (SD)                           7.23 (4.61)            7.10 (4.73)
Day 2, mean (SD)                           7.08 (4.95)            7.41 (5.78)
Intraobserver difference, mean (SD)        -0.15 (1.90)           0.31 (2.68)
Intraclass correlation coefficient (CI)    0.83 (0.69 to 0.98)    0.72 (0.49 to 0.95)
Agreement ±1 TP (%): all / men / women     62 / 62 / 61           49 / 62 / 33
Agreement ±3 TP (%): all / men / women     95 / 90 / 100          85 / 90 / 78
Limits of agreement                        (illegible in source)  -5.05; 5.66
SDD*                                       4                      6

Reliability estimated by the intraclass correlation coefficient.
*Smallest detectable difference.
SDD, smallest detectable difference; TP, tender points.

The median TP count of 8 was elevated as compared 
with the median TP count in the general population, 
which is between 3 and 6 TPs. 6 Previously, it has been 
shown that TP counts were elevated in regional pain 
conditions as compared with pain-free controls, but 
lower than in fibromyalgia. 30 

However, SDD ranged from 4 to 6, indicating that TP examination was
less precise than it was reliable. Thus, according to the present study,
TP examination may result in TP counts that differentiate between high,
intermediate or low levels, but not between different levels within the
low or high range. Moreover, TP examination, as used in the present
study, would not be sufficiently precise to differentiate between
patients with TP counts above or below the 10/11 TP threshold used in
the diagnosis of fibromyalgia.

Accordingly, an SDD of 4-6 was not impressive, but it
was not so different from other measures in LBP. The






Figure 3 Intrarater reliability and agreement. Reliability with lines of equality shown in the left panel. Agreement shown by 
Bland-Altman plots in the right panel displaying differences of tender point (TP) counts on the y-axis and average of TP counts 
on the x-axis. The upper and the lower horizontal lines represent 95% limits of agreement. Areas of the circles are proportional to 
the number of observations. 
Table 3 Interrater differences, reliability and agreement

Measure                                    Day 1                  Day 2
Observer A, mean (SD)                      7.23 (4.61)            7.08 (4.95)
Observer B, mean (SD)                      7.10 (4.73)            7.41 (5.78)
Interobserver difference, mean (SD)        -0.13 (2.30)           0.33 (2.08)
Intraclass correlation coefficient (CI)    0.77 (lower limit illegible in source; upper limit 0.97)    0.84 (0.70 to 0.99)
Agreement ±1 TP (%): all / men / women     59 / 67 / 50           56 / 57 / 56
Agreement ±3 TP (%): all / men / women     85 / 95 / 72           87 / 90 / 83
Limits of agreement                        (illegible in source)  -3.83; 4.50
SDD*                                       5                      5

Reliability estimated by the intraclass correlation coefficient.
*Smallest detectable difference.
SDD, smallest detectable difference; TP, tender points.

minimal detectable change, which is defined closely to 
SDD, 25 31 has been shown to be 4-5 points in the 
Roland Morris Questionnaire, 32 a commonly used instru- 
ment in LBP. 

In fibromyalgia, the peripheral sensory thresholds are 
normal, but pain processing is augmented, primarily 
due to dysfunction of the descending pain inhibition 
system in the brainstem. 33 In the present study, the 
patients were sick-listed because of chronic LBP, and we have
previously presented data making it plausible that LBP can partly be
explained by mechanisms similar to those seen in fibromyalgia
patients. 10

We found high internal consistency, as all Cronbach's α values were
above 0.90. This may support the assumption that TP counts measure the
same construct, that is, insufficient pain inhibition, rather than local
abnormality. Therefore, in chronic LBP patients, TPs may be interpreted
as follows: a high TP count may indicate an insufficiently functioning
descending pain inhibition system, whereas a low TP count may indicate
a well-functioning system. TP counts in the middle of the distribution
are inconclusive. The present study does not provide sufficient data to
set limits for high or low TP counts in LBP patients.

Figure 4 Inter-rater reliability and agreement. Reliability with lines of equality shown in the left panel. Agreement shown by
Bland-Altman plots in the right panel displaying differences of tender point (TP) counts on the y-axis and the average of TP
counts on the x-axis. The upper and the lower horizontal lines represent 95% limits of agreement. Areas of the circles are
proportional to the number of observations.

In the present chronic LBP population, there was no sig- 
nificant change in TP counts during 1 week. We could 
have chosen a shorter or longer interval, but 1 week was 
chosen for pragmatic reasons, because we assumed that 
1 week would not be too long in a patient population with 
long-lasting pain. One might expect more change in TP 
counts during 1 week in patients with acute LBP. A system- 
atic difference in TP count between the first and second 
TP examinations might have occurred, but such a poten- 
tial difference was not apparent because the raters were 
randomised to be either the first or second rater. 

The value of TP examination has been questioned. 
First, the examination method may be unreliable, because the pain
response may be affected by expectations 1 or distress. 34 When the
examination is performed randomly with the patient blinded to the
pressure gradient, the results differ from those of non-blinded
testing. 34 35 Second, it may be inadequate to use a sharp cut-point
(≥11 TPs) to distinguish health from disease in pain conditions. 36 At
present, fibromyalgia is
considered part of a larger continuum. 37 38 Third, there 
have been problems with implementation of the examin- 
ation technique, especially in primary care. Often, it has 
been incorrectly performed, and some physicians have 
refused to use the method. 39 

Therefore, new criteria for diagnosing fibromyalgia 
have been developed and validated. These criteria do not 
include TP examination, and therefore they will enable 
clinicians and researchers to diagnose fibromyalgia by 
surveys. However, the new criteria were not meant to 
replace the original American College of Rheumatology 
(ACR) criteria, but to represent an alternative method of 
diagnosis 39 ; and the new criteria have not been tested in 
rheumatic conditions and may not be relevant in patients 
with inflammatory rheumatic diseases. In these condi- 
tions, fibromyalgia symptoms may be caused by rheum- 
atic disease and not by dysfunction of the descending 
pain inhibition system. Therefore, TP examination will 
still be relevant both at present and in the future. 

The reliability of testing each TP location was not dif- 
ferent from previous reporting in the literature. 13-15 

Strengths 

The present study was conducted in a well-defined popu- 
lation recruited by general practitioners on the basis of 
sick-listing due to LBP, and all had chronic LBP. TPs 
were normally distributed, making it possible to analyse 
data with parametric methods. 

Weaknesses 

The number of patients was small, resulting in wide CIs of ICC, and
only two raters participated. If more raters had participated, the
results would have been more generalisable.

Perspectives 

The possible advantages of using TP examination in 
LBP patients include ease and speed, no requirements 
of equipment and good reliability and agreement. 
Furthermore, malingering or appealing distress will 
probably not induce bias in LBP patients, who do not 
know what to prefer, many or few TPs. 

The possible disadvantages include lack of precision and 
the need for training and equipment (dolorimeter). 

We need to know more about the variability of the TP 
count over time, and we need reproducibility studies 
comparing TP counts with other measures of dysfunc- 
tion of the descending pain-inhibiting system. 37 As an 
example, lack of cold tolerance has been documented 
in whiplash patients with prolonged symptoms. 40 TP 
counts may be compared with cold tolerance. 

Furthermore, it would be interesting to see reliability 
and agreement studies of the total TP count in fibromyal- 
gia patients and patients with inflammatory rheumatic dis- 
eases. Findings resembling the results of the present study 
may have implications for the fibromyalgia criteria. 

Conclusion 

Digital TP examination in sick-listed chronic LBP 
patients was a reliable but not precise instrument. More 
reliability and agreement studies are needed in LBP 
patients and other populations, including patients with 
inflammatory rheumatic diseases. 